660 research outputs found

    Evaluation of Automatic Video Captioning Using Direct Assessment

    We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground truth or correct answer against which to measure. Automatic metrics for comparing automatic video captions against a manual caption, such as BLEU and METEOR, drawn from techniques used in evaluating machine translation, were used in the TRECVid video captioning task in 2016, but these are shown to have weaknesses. The work presented here brings human assessment into the evaluation by crowdsourcing how well a caption describes a video. We automatically degrade the quality of some sample captions which are assessed manually and from this we are able to rate the quality of the human assessors, a factor we take into account in the evaluation. Using data from the TRECVid video-to-text task in 2016, we show how our direct assessment method is replicable and robust and should scale to where there are many caption-generation techniques to be evaluated. Comment: 26 pages, 8 figures
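The assessor quality check described in this abstract can be illustrated with a minimal sketch. The abstract does not say which statistic is used, so the mean score gap between an original caption and its degraded copy, the `min_gap` threshold, and all names and scores below are assumptions made purely for illustration.

```python
from statistics import mean


def assessor_reliability(pairs):
    """Mean gap between scores for original and degraded captions.

    pairs: list of (original_score, degraded_score) from one assessor.
    A careful assessor should rate degraded captions lower, giving a
    clearly positive gap. (Illustrative statistic, not the paper's.)
    """
    return mean(orig - degraded for orig, degraded in pairs)


def filter_assessors(all_ratings, min_gap=10.0):
    """Keep assessors whose mean gap reaches min_gap (hypothetical threshold)."""
    kept = {}
    for name, pairs in all_ratings.items():
        if not pairs:
            continue  # no degraded-caption checks for this assessor
        gap = assessor_reliability(pairs)
        if gap >= min_gap:
            kept[name] = gap
    return kept


# Made-up scores on a 0-100 scale.
ratings = {
    "assessor_a": [(80.0, 35.0), (70.0, 40.0), (90.0, 50.0)],
    "assessor_b": [(60.0, 62.0), (55.0, 50.0), (70.0, 71.0)],
}
kept = filter_assessors(ratings)
```

Here `assessor_b` barely distinguishes degraded captions from originals and would be excluded, while `assessor_a` would be retained.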

    Automatic skin segmentation for gesture recognition combining region and support vector machine active learning

    Skin segmentation is the cornerstone of many applications such as gesture recognition, face detection, and objectionable image filtering. In this paper, we attempt to address the skin segmentation problem for gesture recognition. Initially, given a gesture video sequence, a generic skin model is applied to the first couple of frames to automatically collect the training data. Then, an SVM classifier based on active learning is used to identify the skin pixels. Finally, the results are improved by incorporating region segmentation. The proposed algorithm is fully automatic and adaptive to different signers. We have tested our approach on the ECHO database. Compared with other existing algorithms, our method achieves better performance.

    A framework for sign language recognition using support vector machines and active learning for skin segmentation and boosted temporal sub-units

    This dissertation describes new techniques that can be used in a sign language recognition (SLR) system, and more generally in human gesture systems. Any SLR system consists of three main components: Skin detector, Tracker, and Recognizer. The skin detector is responsible for segmenting skin objects like the face and hands from video frames. The tracker keeps track of the hand location (more specifically the bounding box) and detects any occlusions that might happen between any skin objects. Finally, the recognizer tries to classify the performed sign into one of the sign classes in our vocabulary using the set of features and information provided by the tracker. In this work, we propose a new technique for skin segmentation using SVM (support vector machine) active learning combined with region segmentation information. Having segmented the face and hands, we need to track them across the frames. So, we have developed a unified framework for segmenting and tracking skin objects and detecting occlusions, where both components of segmentation and tracking help each other. Good tracking helps to reduce the search space for skin objects, and accurate segmentation increases the overall tracker accuracy. Instead of dealing with the whole sign for recognition, the sign can be broken down into elementary subunits, which are far fewer in number than the total number of signs in the vocabulary. This motivated us to propose a novel algorithm to model and segment these subunits, then try to learn the informative combinations of subunits/features using a boosting framework. Our results reached above 90% recognition rate using very few training samples.

    TRECVID 2007 - Overview


    Creating a web-scale video collection for research

    This paper begins by considering a number of important design questions for a web-scale, widely available, multimedia test collection intended to support long-term scientific evaluation and comparison of content-based video analysis and exploitation systems. Such exploitation systems would include the kinds of functionality already explored within the annual TRECVid benchmarking activity such as search, semantic concept detection, and automatic summarisation. We then report on our progress in creating such a multimedia collection which we believe to be web scale and which will support a next generation of benchmarking activities for content-based video operations, and we report on our plans for how we intend to put this collection, the IACC.1 collection, to use

    TRECVID 2008 - goals, tasks, data, evaluation mechanisms and metrics

    The TREC Video Retrieval Evaluation (TRECVID) 2008 is a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last 7 years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. In 2008, 77 teams (see Table 1) from various research organizations --- 24 from Asia, 39 from Europe, 13 from North America, and 1 from Australia --- participated in one or more of five tasks: high-level feature extraction, search (fully automatic, manually assisted, or interactive), pre-production video (rushes) summarization, copy detection, or surveillance event detection. The copy detection and surveillance event detection tasks are being run for the first time in TRECVID. This paper presents an overview of TRECVid in 2008

    Multimodal Classification of Urban Micro-Events

    In this paper we seek methods to effectively detect urban micro-events. Urban micro-events are events which occur in cities, have limited geographical coverage and typically affect only a small group of citizens. Because of their scale these are difficult to identify in most data sources. However, by using citizen sensing to gather data, detecting them becomes feasible. The data gathered by citizen sensing is often multimodal and, as a consequence, the information required to detect urban micro-events is distributed over multiple modalities. This makes it essential to have a classifier capable of combining them. In this paper we explore several methods of creating such a classifier, including early, late, hybrid fusion and representation learning using multimodal graphs. We evaluate performance on a real world dataset obtained from a live citizen reporting system. We show that a multimodal approach yields higher performance than unimodal alternatives. Furthermore, we demonstrate that our hybrid combination of early and late fusion with multimodal embeddings performs best in classification of urban micro-events
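The difference between the early and late fusion strategies named in this abstract can be shown with a minimal sketch. The linear scorers, the equal-weight averaging, the two modalities, and every weight and feature value below are hypothetical and are not taken from the paper.

```python
def linear_score(features, weights):
    """Dot product standing in for a trained classifier's decision score."""
    return sum(f * w for f, w in zip(features, weights))


def early_fusion(text_feats, image_feats, joint_weights):
    """Concatenate modality features first, then apply one joint classifier."""
    return linear_score(text_feats + image_feats, joint_weights)


def late_fusion(text_feats, image_feats, text_weights, image_weights):
    """Score each modality separately, then average the two scores."""
    text_score = linear_score(text_feats, text_weights)
    image_score = linear_score(image_feats, image_weights)
    return 0.5 * (text_score + image_score)


# Made-up feature vectors for one citizen report.
text = [1.0, 0.0]
image = [0.5, 2.0]

early = early_fusion(text, image, [0.2, 0.1, 0.4, 0.3])
late = late_fusion(text, image, [0.2, 0.1], [0.4, 0.3])
```

A hybrid scheme, as the abstract describes it at a high level, would combine both kinds of score; how the paper does so is not stated here.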

    An Investigation into the Labor Market Behavior and Characteristics of Emirati Unemployed

    The strong and robust growth of the United Arab Emirates (UAE) over the past decade has significantly raised the standards of living in the country, and has created remarkable economic and social transformations. However, there is some concern that strong output growth has yet to translate into an equivalent growth of jobs for UAE citizens, particularly outside the public sector and among young nationals. A careful estimate shows that the number of unemployed Emiratis at the end of 2011 was 34,750, of which 72 percent were women and 65 percent were youth. Among the youth, the percentage of unemployed females was 70 percent. In 2010 the Emirati unemployment rate was estimated at 14 percent: 8 percent among males and 28 percent among females. In 2011, the unemployment rate was estimated at 12.8%; the highest unemployment rate was in Al Fujairah (19.5%), followed by Abu Dhabi (15.1%), and the lowest rate was estimated in Dubai at 7%.

    HLVU : A New Challenge to Test Deep Understanding of Movies the Way Humans do

    In this paper we propose a new evaluation challenge and direction in the area of High-level Video Understanding. The challenge we are proposing is designed to test automatic video analysis and understanding, and how accurately systems can comprehend a movie in terms of actors, entities, events and their relationship to each other. A pilot High-Level Video Understanding (HLVU) dataset of open source movies was collected for human assessors to build a knowledge graph representing each of them. A set of queries will be derived from the knowledge graph to test systems on retrieving relationships among actors, as well as reasoning and retrieving non-visual concepts. The objective is to benchmark if a computer system can "understand" non-explicit but obvious relationships the same way humans do when they watch the same movies. This is a long-standing problem that is being addressed in the text domain and this project moves similar research to the video domain. Work of this nature is foundational to future video analytics and video understanding technologies. This work can be of interest to streaming services and broadcasters hoping to provide more intuitive ways for their customers to interact with and consume video content.

    The Impact of the Labor policy on demographics

    As a country that relies heavily on imported workers, the impact of the United Arab Emirates' (UAE) labor policies on demographics cannot be overstated. The number and the types of workers admitted into the UAE every year, and the duration of their stay, directly affect the demographic profile of the nation's population in terms of size, growth, age, gender, race, health, nationality, as well as socioeconomic status like education and income. Policies that continue to encourage the importing of young, uneducated and low-paid workforce from abroad would only exacerbate the existing gender and ethnic imbalance in the population, as such workers tend to be male, single, and coming from a few south Asian countries. By contrast, labor policies that encourage the use of more skilled knowledge workers are more likely to bring in people from more diverse ethnic backgrounds and with more balanced distribution across gender and age. Labor policies also affect demographics through their impact on marital and family relationships, as higher-paid workers are more likely to bring their families to the UAE or start one in the country than low-paid laborers are. The impact of labor policies on demographics of local population is significant too, most likely through their impact on female employment and costs of living, which subsequently affect local people's marriage patterns and fertility rates.